
prov/lnx: add FI_MSG and FI_RMA support #12209

Open
aingerson wants to merge 9 commits into ofiwg:main from aingerson:lnx2

@aingerson
Contributor

This is built on top of the refactor in #12188.
It fixes various bugs in lnx in addition to adding FI_MSG and FI_RMA support. I'm opening it up for CI testing and initial comments, but it is not finalized. There are still some lingering gaps (for example, proper support for FI_MR_VIRT_ADDR).

@aingerson
Contributor Author

@amirshehataornl @jfillers FYI this is what I have so far

aingerson added 9 commits May 12, 2026 10:23
Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
The current method for registering MRs with the core providers
works only if there is a single domain or if the domains can somehow
use each other's keys. The keys for the domains can differ, however,
and since only one core MR fid is stored, lnx always uses the MR fid
from the first domain it was used on.

Change the core MR fid into an array so we can register on every
domain. The ep/domain stores its domain index so we can make sure
to register on, and return, the correct core fid.
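A minimal sketch of the per-domain MR array described above. It assumes hypothetical names (`struct lnx_mr`, `struct core_mr`, `core_register()`, `lnx_get_core_mr()`) and mocks the core registration (which in libfabric would go through `fi_mr_regattr()` on each core domain) so it runs standalone. It also assumes registration on a given domain happens lazily on first use; the commit only states that the domain index selects the correct core fid.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit on the number of core domains lnx links together. */
#define LNX_MAX_DOMAINS 4

/* Stand-in for a core provider's struct fid_mr. */
struct core_mr {
	int domain_idx;
	uint64_t key; /* key issued by this particular core domain */
};

struct lnx_mr {
	void *buf;
	size_t len;
	/* One core MR per core domain instead of a single stored fid. */
	struct core_mr *core_mr[LNX_MAX_DOMAINS];
};

/* Mock registration: each domain issues a different key for the same buffer. */
static struct core_mr *core_register(int domain_idx, void *buf, size_t len)
{
	struct core_mr *mr;

	(void) buf;
	(void) len;
	mr = malloc(sizeof(*mr));
	if (!mr)
		return NULL;
	mr->domain_idx = domain_idx;
	mr->key = 0x1000u * (uint64_t) (domain_idx + 1);
	return mr;
}

/*
 * Return the core MR for the domain index carried by the ep/domain,
 * registering on that domain the first time it is needed (assumed lazy).
 */
static struct core_mr *lnx_get_core_mr(struct lnx_mr *mr, int domain_idx)
{
	if (domain_idx < 0 || domain_idx >= LNX_MAX_DOMAINS)
		return NULL;
	if (!mr->core_mr[domain_idx])
		mr->core_mr[domain_idx] =
			core_register(domain_idx, mr->buf, mr->len);
	return mr->core_mr[domain_idx];
}

/* Release every per-domain registration. */
static void lnx_mr_release(struct lnx_mr *mr)
{
	for (int i = 0; i < LNX_MAX_DOMAINS; i++) {
		free(mr->core_mr[i]);
		mr->core_mr[i] = NULL;
	}
}
```

With a single stored fid, an endpoint on domain 1 would receive the key issued by domain 0; indexing by domain avoids that mismatch.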

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
lnx was only using the first iov/descriptor while advertising
support for multiple IOVs.
Supporting multiple IOVs requires translating the array of descriptors
into an array of core provider descriptors.
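A minimal sketch of the descriptor translation, under assumed names: `struct lnx_mr`, `LNX_IOV_LIMIT`, and `lnx_translate_desc()` are hypothetical, and each lnx descriptor is assumed to point at an lnx MR that holds one core descriptor per domain. A NULL descriptor array, or a NULL entry in it, is passed through as NULL.

```c
#include <stddef.h>

#define LNX_MAX_DOMAINS 4 /* hypothetical */
#define LNX_IOV_LIMIT 4   /* hypothetical advertised iov_limit */

/* Hypothetical lnx MR: one core provider descriptor per domain. */
struct lnx_mr {
	void *core_desc[LNX_MAX_DOMAINS];
};

/*
 * Translate every application descriptor (one per iov entry) into the
 * core descriptor for domain_idx, instead of using only desc[0].
 * Returns -1 if count or domain_idx is out of range, 0 on success.
 */
static int lnx_translate_desc(void **desc, size_t count, int domain_idx,
			      void **core_desc)
{
	if (count > LNX_IOV_LIMIT || domain_idx < 0 ||
	    domain_idx >= LNX_MAX_DOMAINS)
		return -1;

	for (size_t i = 0; i < count; i++) {
		struct lnx_mr *mr = desc ? desc[i] : NULL;

		core_desc[i] = mr ? mr->core_desc[domain_idx] : NULL;
	}
	return 0;
}
```

The translated array can then be passed unchanged to the core provider's vectored call along with the original iov array.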

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
We shouldn't be relying on global resources, and there's no reason
for the entries to come from different locations. We can just
use the lep receive bufpool. We also don't need a separate
lock for accessing the bufpool; we can use the util_ep lock,
which has the bonus of being optimized out when it is not
necessary.
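A simplified model of the per-endpoint pool guarded by the endpoint's own lock. Everything here is hypothetical: `struct lnx_ep`, `need_lock`, and the free-list pool stand in for the lnx endpoint, its util_ep lock (in libfabric an `ofi_genlock`, which can be a no-op lock), and the receive bufpool. A plain pthread mutex plus a flag models the "optimized out" case.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8 /* hypothetical fixed pool size */

struct rx_entry {
	struct rx_entry *next;
	int in_use;
};

/* Hypothetical endpoint: the pool lives on the ep, not in a global. */
struct lnx_ep {
	pthread_mutex_t lock;
	bool need_lock; /* false when the threading model makes locking unnecessary */
	struct rx_entry entries[POOL_SIZE];
	struct rx_entry *free_list;
};

static void ep_lock(struct lnx_ep *ep)
{
	if (ep->need_lock)
		pthread_mutex_lock(&ep->lock);
}

static void ep_unlock(struct lnx_ep *ep)
{
	if (ep->need_lock)
		pthread_mutex_unlock(&ep->lock);
}

static void lnx_ep_init(struct lnx_ep *ep, bool need_lock)
{
	pthread_mutex_init(&ep->lock, NULL);
	ep->need_lock = need_lock;
	ep->free_list = NULL;
	for (int i = POOL_SIZE - 1; i >= 0; i--) {
		ep->entries[i].in_use = 0;
		ep->entries[i].next = ep->free_list;
		ep->free_list = &ep->entries[i];
	}
}

static void lnx_ep_fini(struct lnx_ep *ep)
{
	pthread_mutex_destroy(&ep->lock);
}

/* Allocate under the same ep lock that protects the rest of the ep state. */
static struct rx_entry *lnx_rx_entry_alloc(struct lnx_ep *ep)
{
	struct rx_entry *e;

	ep_lock(ep);
	e = ep->free_list;
	if (e) {
		ep->free_list = e->next;
		e->in_use = 1;
	}
	ep_unlock(ep);
	return e;
}

static void lnx_rx_entry_free(struct lnx_ep *ep, struct rx_entry *e)
{
	ep_lock(ep);
	e->in_use = 0;
	e->next = ep->free_list;
	ep->free_list = e;
	ep_unlock(ep);
}
```

Reusing the endpoint lock means a single-threaded configuration pays no locking cost for the pool, which a dedicated pool lock would not allow.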

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Add support for the FI_RMA APIs. This is done by requiring
FI_MR_RAW if FI_RMA support is requested. The keys for all
underlying core providers are stored in an array (indexed
by domain index), so the application key is 8 * num_domains
bytes (which is why the larger raw key is required).
The app exchanges the raw key and then maps it on the
remote side to get a local uint64_t key for use in the RMA
calls. This key is a pointer to an internal structure
(lnx_mr_key) which holds all the core provider keys for
use in the actual RMA calls.
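A minimal sketch of the raw key layout and mapping described above. The field layout of `struct lnx_mr_key` and the helpers `lnx_pack_raw_key()`, `lnx_map_raw_key()`, `lnx_core_key()`, and `lnx_unmap_key()` are hypothetical; they loosely model what `fi_mr_raw_attr()` and `fi_mr_map_raw()` would do inside lnx. The real API reports a too-small buffer with `-FI_ETOOSMALL`; this sketch returns -1. It also assumes both peers share the same byte order, since the keys are copied as raw bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mapped key: one core key per domain, indexed by domain. */
struct lnx_mr_key {
	size_t num_domains;
	uint64_t core_keys[];
};

/*
 * Local side: pack the per-domain core keys into the raw key the app
 * exchanges (8 * num_domains bytes). On a too-small buffer, report the
 * required size in *key_size and fail.
 */
static int lnx_pack_raw_key(const uint64_t *core_keys, size_t num_domains,
			    uint8_t *raw_key, size_t *key_size)
{
	size_t need = sizeof(uint64_t) * num_domains;

	if (*key_size < need) {
		*key_size = need;
		return -1;
	}
	memcpy(raw_key, core_keys, need);
	*key_size = need;
	return 0;
}

/* Remote side: map the raw key to a local uint64_t that points at an lnx_mr_key. */
static int lnx_map_raw_key(const uint8_t *raw_key, size_t key_size,
			   uint64_t *key)
{
	struct lnx_mr_key *mk;

	if (key_size == 0 || key_size % sizeof(uint64_t))
		return -1;
	mk = malloc(sizeof(*mk) + key_size);
	if (!mk)
		return -1;
	mk->num_domains = key_size / sizeof(uint64_t);
	memcpy(mk->core_keys, raw_key, key_size);
	*key = (uint64_t) (uintptr_t) mk;
	return 0;
}

/* RMA path: recover the core key for the domain actually used. */
static uint64_t lnx_core_key(uint64_t key, size_t domain_idx)
{
	struct lnx_mr_key *mk = (struct lnx_mr_key *) (uintptr_t) key;

	return mk->core_keys[domain_idx];
}

/* Counterpart to the mapping (fi_mr_unmap_key in the real API). */
static void lnx_unmap_key(uint64_t key)
{
	free((void *) (uintptr_t) key);
}
```

For example, with two linked domains the exchanged raw key is 16 bytes, and each RMA call dereferences the mapped 64-bit key to pick the core key for whichever domain carries the transfer.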

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Add FI_MSG support by using the existing regular message queues.

Consolidate and refactor some code so it can be used by both sets of
functions.
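One way a consolidated matching path could serve both sets of functions, sketched under assumptions: `struct lnx_rx`, `struct lnx_queue`, and `lnx_queue_match()` are hypothetical names, not the lnx source. Tagged receives use the standard libfabric rule (tags match when they agree on every bit not set in `ignore`), while untagged FI_MSG receives in the message queue match the first posted entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posted-receive entry shared by tagged and untagged queues. */
struct lnx_rx {
	struct lnx_rx *next;
	uint64_t tag;    /* ignored for untagged receives */
	uint64_t ignore; /* bits excluded from tag comparison */
};

struct lnx_queue {
	struct lnx_rx *head;
	struct lnx_rx *tail;
};

/* Append at the tail so receives are matched in posting order. */
static void lnx_queue_push(struct lnx_queue *q, struct lnx_rx *rx)
{
	rx->next = NULL;
	if (q->tail)
		q->tail->next = rx;
	else
		q->head = rx;
	q->tail = rx;
}

static bool lnx_match(const struct lnx_rx *rx, uint64_t tag, bool tagged)
{
	if (!tagged)
		return true; /* FI_MSG: the first posted receive matches */
	return ((rx->tag ^ tag) & ~rx->ignore) == 0;
}

/* Shared by both paths: remove and return the first matching receive. */
static struct lnx_rx *lnx_queue_match(struct lnx_queue *q, uint64_t tag,
				      bool tagged)
{
	struct lnx_rx *prev = NULL, *cur = q->head;

	while (cur) {
		if (lnx_match(cur, tag, tagged)) {
			if (prev)
				prev->next = cur->next;
			else
				q->head = cur->next;
			if (q->tail == cur)
				q->tail = prev;
			cur->next = NULL;
			return cur;
		}
		prev = cur;
		cur = cur->next;
	}
	return NULL;
}
```

Keeping tagged and untagged receives in separate queues while sharing this helper prevents an untagged message from consuming a tagged receive, and vice versa.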

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
fi_ubertest was sometimes skipping the MR raw attr/map steps for FI_MR_RAW,
causing a map failure with providers that require the mapping.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
This lets the fi_av_xfer test pass. It was failing on reinsert
because the buffer was already allocated and could not be allocated
again for the re-insert.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Remove exclusions for tests that are now valid with the addition of FI_MSG and FI_RMA.
Add fi_ubertest configurations for the new functionality.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
@aingerson
Contributor Author

@jfillers I'm going to be on vacation for the next week and a half. I've enabled all of our CI fabtests for lnx with verbs+shm and tcp+shm, and everything looks good. Can you test it with all your use cases, see how it does, and review it with any issues you find?
